Conversation
# Conflicts:
#	openml/runs/functions.py
reimplemented extract parameters from run (based on sklearn converter)
Development into fix169
Develop into fix169 (II)
Codecov Report
@@            Coverage Diff             @@
##           develop     #241      +/-  ##
===========================================
+ Coverage    88.36%    88.82%    +0.46%
===========================================
  Files           23        24        +1
  Lines         1960      2032       +72
===========================================
+ Hits          1732      1805       +73
+ Misses         228       227        -1
mfeurer
left a comment
Sorry, I only got through half of this PR yet. Will continue tomorrow.
    for param_name in sorted(model_params):
        if 'random_state' in param_name:
            currentValue = model_params[param_name]
            # important to draw the value at this point (and not in the if statement)
Hm, could you explain why? It's not clear to me from this.
Added description.
I'm not sure if we really need this, but it seems a nice property to respect.
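To make the comment in the snippet above concrete, here is a hedged, self-contained sketch (not the actual openml implementation; `seed_params` and the dict-based interface are illustrative) of why the value is drawn before the `if` statement: a value is drawn for every `random_state` parameter, even those that will not be replaced, so the sequence of draws is identical no matter which parameters already carry a value.

```python
import random

def seed_params(model_params, seed):
    """Sketch: return a copy of model_params with unset random_state
    parameters replaced by deterministically drawn seeds."""
    rng = random.Random(seed)
    seeded = dict(model_params)
    for param_name in sorted(model_params):
        if 'random_state' in param_name:
            current_value = model_params[param_name]
            # important to draw the value at this point (and not in the
            # if statement): the draw happens unconditionally, so the
            # stream of random numbers does not depend on which
            # parameters were already set
            new_value = rng.randint(0, 2 ** 16)
            if current_value is None:
                seeded[param_name] = new_value
    return seeded

# 'random_state' gets the same drawn value in both cases, because the
# draw for 'clf__random_state' is consumed either way:
a = seed_params({'clf__random_state': None, 'random_state': None}, seed=42)
b = seed_params({'clf__random_state': 7, 'random_state': None}, seed=42)
```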
    return False


def _get_seeded_model(model, seed=None):
    '''Sets all the non-seeded components of a model with a seed.
You should mention the restriction that one cannot use a random state in the pipelines.
I'm not sure. One could argue that that is a restriction of the run_tasks function, as that is the function the user interacts with. Furthermore, that function is responsible for the check. This function only adds seeds to unseeded models.
But it can raise an exception:

    import numpy as np
    import sklearn.ensemble
    import openml

    rf = sklearn.ensemble.RandomForestClassifier(
        random_state=np.random.RandomState(1))
    openml.runs.functions._get_seeded_model(rf, 5)
But you're right, it should be documented in the run_tasks() function.
Mea culpa, I will add it.
    return run


def initialize_model_from_run(run_id):
This is neither used nor tested.
Agreed, I added tests.
mfeurer
left a comment
I'm mostly through, I only need to understand why test_existing_setup_exists has to use a different classifier.
    Parameters
    ----------
    flow_id : int
It looks like you copied the docstring from the object above and didn't adapt it.
I love copy/pasting. Fixed it.
        _current['oml:component'] = main_id
    else:
        raise ValueError("parameter %s not in flow description of flow %s"
                         % (param, flow.name))
    _current['oml:component'] = _param_dict[_flow.name]
Why is it once an ID, and once a name?
    @staticmethod
    def _parse_parameters(model, flow):
    def _parse_parameters(model, server_flow):
Looking at this again I'm actually surprised that this is not called run_task, but only when publishing. But maybe this should be its own issue/PR.
Can you elaborate a bit? I don't really understand which / why.
Sorry, I meant "that this is not called IN run_task"
    current = openml.setups.get_setup(setups[idx])
    assert current.flow_id > 0
    if num_params[idx] == 0:
        assert current.parameters is None
Could you please use self.assert in the unit tests? It gives nicer outputs.
You mean self.asserts()?
Sure.
test_existing_setup_exists makes use of sentinels. That gives us a general problem when serializing a flow (or setup) and comparing it to one on the server, as the flow (setup) serialization is not aware of whether (and which) a sentinel string was used. For that reason, I slightly generalized one of the functions, such that it does not rely on name mappings for the main flow (a boolean flag indicates a function call at depth 1). This way, we do not have to remove the sentinels and we can in this context still test flows without subflows.
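The sentinel mechanism described above can be sketched like this (a hedged illustration; `add_sentinel` and its naming scheme are hypothetical, not the actual openml test helpers): a unique string is appended to the flow name before publishing, so each test run creates a flow — and hence a setup — that cannot already exist on the server.

```python
import uuid

def add_sentinel(flow_name):
    """Sketch: append a unique sentinel to a flow name so that parallel
    test runs never publish colliding flows/setups."""
    sentinel = 'TEST%s' % uuid.uuid4().hex[:10].upper()
    return '%s%s' % (flow_name, sentinel), sentinel

# Two runs of the same test produce two distinct flow names:
name1, s1 = add_sentinel('sklearn.ensemble.RandomForestClassifier')
name2, s2 = add_sentinel('sklearn.ensemble.RandomForestClassifier')
```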
    assert current.flow_id > 0
    if num_params[idx] == 0:
        self.asserts(current.parameters is None)
        self.assertTrue(current.parameters is None)
Sorry, I should have been more specific. The correct one here would be self.assertIsNone.
        self.assertTrue(current.parameters is None)
    else:
        self.asserts(len(current.parameters) == num_params[idx])
        self.assertTrue(len(current.parameters) == num_params[idx])
Sorry, I should have been more specific. The correct one here would be self.assertEqual.
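A small self-contained illustration of the suggestion above (the test case and values are made up): the specialized assert methods report the offending values on failure, whereas `assertTrue(expr)` can only report that the expression was False.

```python
import unittest

class AssertStyleExample(unittest.TestCase):
    def test_parameters(self):
        parameters = None
        # instead of self.assertTrue(parameters is None):
        self.assertIsNone(parameters)
        parameters = ['a', 'b']
        # instead of self.assertTrue(len(parameters) == 2); on failure
        # this prints both the actual and the expected value:
        self.assertEqual(len(parameters), 2)

# Run the example test case programmatically:
suite = unittest.TestLoader().loadTestsFromTestCase(AssertStyleExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```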
Thanks for the explanation of the sentinels. Maybe it would be good to add that to the actual unit test. What I'm wondering right now is whether this is safe with respect to running the unit tests in parallel (which happens on travis-ci). You assume the setup does not exist yet, but it could happen that it was already run. Maybe add a sentinel to a hyperparameter? Or is there some other way of making a setup unique? Once we figure out this part, I think we can merge this PR before it becomes too big and create new PRs for the missing functionality.
I had already added this in the comment. Also, the other comment was slightly confusing, so I changed it. Should be fine now? Let's merge.